Cyclic Segmented Parallel Prefix
Abstract
The cyclic segmented parallel prefix (CSPP) circuit is a variation on parallel prefix. Whereas ordinary parallel prefix computes prefix sums of a vector from the beginning, CSPP allows the starting point to move arbitrarily, with the data "wrapping around." The wraparound is widely useful. We have used CSPP to redesign many components of a superscalar processor to run in time logarithmic in the number of inputs. Parallel-prefix circuits and segmented-parallel-prefix circuits are well understood. See, for example, [1] for a discussion of log-depth parallel-prefix circuits. Segmented-parallel-prefix circuits appear in an exercise of [1] and were implemented in the CM-5 supercomputer [7, 5, 4, 2]. The rest of this paper is organized as follows. Section 1 reviews the parallel-prefix problem along with the standard solutions. Section 2 reviews the standard log-depth circuits for solving parallel prefix. Section 3 reviews segmented parallel prefix. Section 4 discusses a few minor variations on parallel prefix. Section 5 describes the cyclic segmented parallel prefix problem. Section 6 discusses more minor variations. Section 7 shows some examples.
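The wraparound behavior described in the abstract can be illustrated with a small software sketch. This is not the circuit from the paper; it is a minimal model under stated assumptions: the scan is inclusive, each position carries a hypothetical boolean "segment start" flag, at least one flag is set, and the output at position i combines the values from the nearest flagged position at or before i (searching cyclically) up to i. The function names cspp_reference and cspp_log_steps are illustrative only. The second function performs ceil(log2 n) rounds of a doubling scan with indices taken modulo n, which mirrors the logarithmic depth mentioned in the abstract without claiming to match the paper's circuit layout.

```python
from operator import add


def cspp_reference(values, starts, op=add):
    """Sequential reference: inclusive cyclic segmented prefix (assumed semantics)."""
    n = len(values)
    if n == 0:
        return []
    if not any(starts):
        raise ValueError("this sketch assumes at least one segment start")
    first = starts.index(True)  # begin the walk at any segment start
    out = [None] * n
    acc = None
    for k in range(n):
        i = (first + k) % n  # walk around the ring exactly once
        acc = values[i] if starts[i] else op(acc, values[i])
        out[i] = acc
    return out


def cspp_log_steps(values, starts, op=add):
    """Same result in ceil(log2 n) doubling rounds with cyclic indexing."""
    n = len(values)
    if n == 0:
        return []
    if not any(starts):
        raise ValueError("this sketch assumes at least one segment start")
    flags, acc = list(starts), list(values)
    d = 1
    while d < n:
        new_flags, new_acc = flags[:], acc[:]
        for i in range(n):
            j = (i - d) % n  # neighbor d positions earlier, wrapping around
            if not flags[i]:  # a set flag blocks anything earlier from flowing in
                new_acc[i] = op(acc[j], acc[i])
                new_flags[i] = flags[j]
        flags, acc = new_flags, new_acc
        d *= 2
    return acc


if __name__ == "__main__":
    vals = [1, 2, 3, 4, 5]
    seg = [False, False, True, False, False]
    print(cspp_reference(vals, seg))
    print(cspp_log_steps(vals, seg))
```

With the segment starting at index 2, positions 0 and 1 receive sums that include the values at indices 2 through 4, which is the "wrapping around" that an ordinary prefix computation from index 0 would not produce. The combining step passes the earlier operand first, so the sketch also works for associative operators that are not commutative, such as string concatenation.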
Related resources
Evaluation of BER in CDMA with Parallel Interference Cancellation
In this paper, we present the performance of a parallel interference cancellation (PIC) scheme in code division multiple access with cyclic prefix (CP-CDMA) systems and propose a new model for cyclic prefix code division multiple access with lower complexity and good resistance to the near-far effect. This method is mostly used for broadband wireless communication in the uplink. Parallel interferen...
Real-Time Recognition of Cyclic Strings by One-Way and Two-Way Cellular Automata
This paper discusses real-time language recognition by 1-dimensional one-way cellular automata (OCAs) and two-way cellular automata (CAs), focusing on limitations of the parallel computation power. To clarify the limitations, we investigate real-time recognition of cyclic strings of the form u^k with u ∈ {0, 1}^+ and k ≥ 2. We show a version of the pumping lemma for recognizing cyclic strings by OCAs,...
Optimizing Parallel Prefix Operations for the Fermi Architecture
The NVIDIA Fermi GPU architecture introduces new instructions designed to facilitate basic, but important, parallel primitives on per-thread predicates, as well as instructions for manipulating and querying bits within a word. This chapter demonstrates the application of these instructions in the construction of efficient parallel algorithm primitives such as reductions, scans, and segmented sc...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...